1. INTRODUCTION


In a recent paper the author showed that, contrary to popular opinion, strict Fréchet differentiability of the class of M-functionals is frequently possible. A necessary requirement for existence of the Fréchet derivative is that the defining psi function be uniformly bounded, and this naturally excludes nonrobust estimators such as the maximum likelihood estimator in normal parametric models.

On the other hand, the smoothness assumptions imposed on the defining psi function in that paper are not appropriate for many common robust proposals in M-estimation theory, such as Huber's (1964) minimax solution and Hampel's (1974) three-part redescender used in estimating location. A host of robust solutions for more general parametric families are obtained through Hampel's (1968) Lemma 5 and generalizations of it (cf. Hampel 1978), and these almost invariably are functions with "sharp corners". Indeed, the problem presented by the failure of psi functions to have continuous partial derivatives has been the focus of papers by Huber (1967) and Carroll (1978) with respect to proofs of asymptotic normality. While Fréchet differentiability of the M-functional a priori gives asymptotic normality of the M-estimator, at least for real-valued observation spaces, it also gives a direct expansion by which the degree of robustness can be measured through the gross error sensitivity. The latter quantity is the supremum of the absolute value of the influence curve of Hampel (1968, 1974), and Huber (1977), assuming existence of the Fréchet derivative, shows that the maximum asymptotic bias in contaminated neighbourhoods of a parametric distribution is proportional to the gross error sensitivity. Consequently, Fréchet differentiability of a statistical functional is an important tool in the robust description of an estimator, and complements the definition of a robust functional as one that is weakly continuous (cf. Hampel 1971).
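As a hedged numerical illustration of the gross error sensitivity (not part of the original argument), the sketch below assumes Huber's location function $\psi_k(x) = \max(-k, \min(k, x))$ at the standard normal $\Phi$, for which the influence curve takes the standard form $IC(x,\Phi,T) = \psi_k(x)/(2\Phi(k) - 1)$; its supremum is then $k/(2\Phi(k) - 1)$. The function names are illustrative only.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def huber_psi(x: float, k: float) -> float:
    """Huber's psi function: the identity clipped to [-k, k]."""
    return max(-k, min(k, x))


def huber_influence(x: float, k: float) -> float:
    """Influence curve of Huber's location M-functional at the standard normal.

    IC(x) = psi_k(x) / E[psi_k'(Z)], where E[psi_k'(Z)] = P(|Z| < k) = 2*Phi(k) - 1.
    """
    return huber_psi(x, k) / (2.0 * PHI.cdf(k) - 1.0)


def gross_error_sensitivity(k: float) -> float:
    """sup_x |IC(x)|, attained at every |x| >= k."""
    return k / (2.0 * PHI.cdf(k) - 1.0)


if __name__ == "__main__":
    for k in (0.5, 1.0, 1.345, 2.0):
        # A crude grid maximum of |IC| over [-10, 10] agrees with the closed form.
        grid_max = max(abs(huber_influence(i / 100.0, k)) for i in range(-1000, 1001))
        print(f"k={k}: GES={gross_error_sensitivity(k):.4f}, grid max={grid_max:.4f}")
```

As $k \to 0$ the sensitivity approaches $\sqrt{2\pi}/2 \approx 1.2533$, the value for the median; larger $k$ trades robustness for efficiency.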


In this paper the methods of nonsmooth analysis described in the book by F.H. Clarke (1983) are introduced to the theory of statistical expansions, and are used here in the proofs of weak continuity and Fréchet differentiability of M-functionals. Consequently, the conditions for Fréchet differentiability given in Clarke (1983) can be relaxed to include most popular M-functionals.

The M-estimator is a solution of the equations

$$\int_R \psi(x,\tau)\,dF_n(x) = 0, \tag{1.1}$$

where $F_n$ is the distribution which attributes atomic mass $1/n$ to each of $n$ independent identically distributed observations $X_1,\dots,X_n$ having common distribution $F \in \mathcal{G}$, the space of probability distributions defined on some separable metrizable observation space $R$. For the applications in this paper it is only necessary to consider $R = E$, the real line. The parameter $\tau \in \Theta$, an open subset of Euclidean $r$-space $E^r$, and $\mathcal{F} = \{F_\tau : \tau \in \Theta\}$ is a parametric family, where the usual assumption is that $F = F_\theta$ for some $\theta \in \Theta$. The function $\psi : R \times \Theta \to E^r$ can be defined through minimization of some loss function, or obtained by some other optimality criterion. The theory of robustness makes use of the M-functional $T$ defined on $\mathcal{G}$, so that more generally $T[G]$ is a solution of the equations

$$K_G(\tau) = \int_R \psi(x,\tau)\,dG(x) = 0 \tag{1.2}$$

if a solution exists, and $T[G] = \infty$ otherwise. Thus the estimator is given by the functional $T$ evaluated at $F_n$, and its asymptotic properties follow from continuity and differentiability of $T$ at $F$ with respect to suitable metrics defined on $\mathcal{G}$. This approach to asymptotic theory for statistics was first considered by von Mises (1947).
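A minimal sketch of solving (1.1), assuming the location model $\psi(x,\tau) = \psi_k(x - \tau)$ with Huber's $\psi_k$ and an illustrative tuning constant $k = 1.345$ (neither is prescribed by the text). Because $\tau \mapsto n^{-1}\sum_i \psi_k(X_i - \tau)$ is continuous and nonincreasing, bisection on $[\min X_i, \max X_i]$ finds a root; the helper names are hypothetical.

```python
def huber_psi(x, k):
    """Huber's psi function: the identity clipped to [-k, k]."""
    return max(-k, min(k, x))


def K_empirical(tau, sample, k):
    """Left side of (1.1) with psi(x, tau) = psi_k(x - tau): (1/n) sum psi_k(X_i - tau)."""
    return sum(huber_psi(x - tau, k) for x in sample) / len(sample)


def huber_location(sample, k=1.345, tol=1e-12):
    """Solve K_{F_n}(tau) = 0 by bisection.

    K_{F_n} is continuous and nonincreasing in tau, nonnegative at min(sample) and
    nonpositive at max(sample), so a root lies between them.
    """
    lo, hi = min(sample), max(sample)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if K_empirical(mid, sample, k) > 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [-1.2, -0.3, 0.1, 0.4, 0.8, 25.0]  # one gross error
    print(huber_location(data))                 # about 0.25
    print(sum(data) / len(data))                # sample mean, about 4.13
```

The clipping in $\psi_k$ is what keeps the single gross error from dragging the estimate towards it, in contrast with the sample mean.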


To avoid ambiguity, and also for good statistical practice, the concept of a selection functional $\rho$ was introduced by Clarke (1983) in order to identify, in the event of several solutions of the equations (1.2), that root which is to be the estimator. That is, the M-functional is defined by $\psi,\rho$ so that

$$\rho(G, T[\psi,\rho,G]) = \inf_{\tau \in I(\psi,G)} \rho(G,\tau),$$

where

$$I(\psi,G) = \Big\{\tau \;\Big|\; \int_R \psi(x,\tau)\,dG(x) = 0,\ \tau \in \Theta\Big\},$$

if a solution exists. Otherwise $T[\psi,\rho,G] = \infty$. The functional $T$ is then Fréchet differentiable at $F$ with respect to the pair $(\mathcal{G}, d_*)$, for suitable metrics $d_*$ on $\mathcal{G}$, if $T$ can be approximated by a linear functional $T'_F$ which is defined on the linear space spanned by the differences $G - H$ of members of $\mathcal{G}$, so that

$$|T[G] - T[F] - T'_F(G - F)| = o(d_*(G,F)) \tag{1.3}$$

as $d_*(G,F) \to 0$, $G \in \mathcal{G}$. Essentially the expansion for Fréchet differentiability depends on a local expansion of the equations (1.2), and a robust selection functional will automatically select the Fréchet differentiable root whenever one exists. To the latter end one uses an auxiliary functional $\rho(G,\tau) = |\tau - \theta|$ to prove existence of a unique Fréchet differentiable root in a local neighbourhood of the parameter $\theta$ when considering the derivative at $F_\theta$. Also, it is sufficient to consider the expansion (1.3) for $T$ defined on $\mathcal{G}$; the usual mathematical extension of the domain of $T$ to the linear space of signed measures is of little importance here.

The Fréchet derivative may be considered strong in the sense that existence of the Fréchet derivative for statistical functionals implies existence of the weaker Hadamard or compact derivatives of Reeds (1976) and Fernholz (1983), and of the Gateaux derivative discussed by Kallianpur (1963), a special case of which is the influence curve

$$IC(x,F,T) = \lim_{t \downarrow 0} \frac{T[(1-t)F + t\delta_x] - T[F]}{t};$$

here $\delta_x$ is the distribution attributing mass 1 to the point $x$. The Gateaux derivative is given by

$$\int IC(x,F,T)\,d(G-F)(x),$$

which coincides with the Fréchet derivative when the latter exists.
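The influence curve can be checked numerically against its defining limit. The sketch below assumes Huber's location $\psi_k$ with $k = 1.345$ and takes $F$ to be an empirical distribution $F_n$, so that $T[(1-t)F_n + t\delta_x]$ solves $(1-t)n^{-1}\sum_i \psi_k(X_i - \tau) + t\,\psi_k(x - \tau) = 0$. The difference quotient is compared with $\psi_k(x - T)/(n^{-1}\#\{i : |X_i - T| < k\})$, which is valid when no residual equals $\pm k$. All names are hypothetical.

```python
def huber_psi(x, k=1.345):
    return max(-k, min(k, x))


def solve_nonincreasing(K, lo, hi, tol=1e-13):
    """Bisection for a root of a nonincreasing K with K(lo) >= 0 >= K(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if K(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def T_mixture(sample, x, t, k=1.345):
    """Huber location functional at the mixture (1 - t) F_n + t delta_x."""
    n = len(sample)

    def K(tau):
        return ((1 - t) * sum(huber_psi(xi - tau, k) for xi in sample) / n
                + t * huber_psi(x - tau, k))

    points = list(sample) + [x]
    return solve_nonincreasing(K, min(points), max(points))


def influence_curve(sample, x, k=1.345):
    """IC(x, F_n, T) = psi_k(x - T) / ((1/n) #{i : |X_i - T| < k}).

    Valid when no residual X_i - T equals +-k, so K_{F_n} is differentiable at T.
    """
    T = T_mixture(sample, x, 0.0, k)
    slope = sum(1 for xi in sample if abs(xi - T) < k) / len(sample)
    return huber_psi(x - T, k) / slope


if __name__ == "__main__":
    data = [-1.2, -0.3, 0.1, 0.4, 0.8, 25.0]
    t = 1e-4
    quotient = (T_mixture(data, 1.0, t) - T_mixture(data, 1.0, 0.0)) / t
    print(quotient, influence_curve(data, 1.0))
```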
Unfortunately, comments by Kallianpur (1963), which were in specific relation to the maximum likelihood estimator (mle), led other researchers to believe the derivative too strong to obtain. Indeed Huber (1981) states: "Unfortunately the concept of Fréchet differentiability appears too strong: in too many cases, the Fréchet derivative does not exist, and even if it does, the fact is difficult to establish."

In Clarke (1983) simple conditions for Fréchet differentiability of M-functionals were given, together with a counterexample to the comments of Kallianpur.

Boos and Serfling (1980) introduce the related notion of a quasi-differential, which assumes the same expansion (1.3) but restricts $G = F_n$ and allows for small-order errors in probability with respect to the Kolmogorov distance between $F_n$ and $F$. This expansion does not offer the same properties of robust description of the estimating functional, and even the mean functional satisfies this stochastic form of differentiability. Beran (1977) also adopts a differential approach using the Hellinger metric, though this appears to be for more specific application.

A weaker set of conditions than conditions A of Clarke (1983) is introduced in Section 2, though for smooth psi functions conditions A of that paper are easier to apply. Theorem 2.1 of this paper is needed to show that condition $A'_4$, introduced here, holds for the popular nonsmooth psi functions. It can be considered as a variation or a generalization of the Glivenko–Cantelli result. Conditions A' are used in Sections 3 and 4 in the theorems that give existence of a unique continuous and Fréchet differentiable root of equations (1.2). In particular the arguments for weak continuity follow when either the Lévy or the Prokhorov metric is used. Important examples of application are given in Section 5, together with the conclusion.


2. A DISCUSSION OF DEFINITIONS AND CONDITIONS A'

Suppose $f$ maps $E^r$ to itself and $\theta$ is a point near which $f$ is Lipschitz. Denote by $\Omega_f$ the set of points at which $f$ fails to be differentiable, which by Rademacher's theorem is known to be a set of Lebesgue measure zero. Let $Jf(\tau)$ be the usual $r \times r$ matrix of partial derivatives whenever $\tau \notin \Omega_f$.

Definition 2.1: The generalized Jacobian of $f$ at $\theta$, denoted by $\partial f(\theta)$, is the convex hull of all $r \times r$ matrices $Z$ obtained as the limit of a sequence of the form $Jf(\tau_i)$, where $\tau_i \to \theta$ and $\tau_i \notin \Omega_f$.

The generalized Jacobian $\partial f(\theta)$ is said to be of maximal rank provided every matrix in $\partial f(\theta)$ is of maximal rank (i.e. nonsingular).
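For $r = 1$ Definition 2.1 reduces to an interval. The sketch below assumes that $f$ is continuously differentiable on each side of $\theta$ near $\theta$ (true for Huber's $\psi_k$), approximates the one-sided limits of $f'$ by central differences at $\theta \pm \varepsilon$, and returns their convex hull; maximal rank then means the interval excludes 0. It illustrates the definition under these assumptions and is not a general algorithm for generalized Jacobians; the names are hypothetical.

```python
def huber_psi(x, k=1.0):
    """Huber's psi function, Lipschitz with corners at +-k."""
    return max(-k, min(k, x))


def central_difference(f, x, h=1e-7):
    """Central difference quotient; accurate where f is smooth on [x - h, x + h]."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def generalized_gradient_1d(f, theta, eps=1e-4):
    """Approximate generalized gradient of a scalar f at theta.

    Assumes f is continuously differentiable on each side of theta near theta, so
    the limits of f' from the left and right exist; the generalized gradient is then
    the closed interval (convex hull) spanned by these two limits.
    """
    left = central_difference(f, theta - eps)
    right = central_difference(f, theta + eps)
    return (min(left, right), max(left, right))


def has_maximal_rank(interval):
    """For r = 1, every element of the interval is nonsingular iff 0 is excluded."""
    lo, hi = interval
    return lo > 0 or hi < 0


if __name__ == "__main__":
    print(generalized_gradient_1d(huber_psi, 1.0))  # corner at k = 1
    print(generalized_gradient_1d(huber_psi, 0.0))  # smooth point
```

At the corner $\theta = k$ the generalized gradient of $\psi_k$ is $[0, 1]$, which is not of maximal rank, while at $\theta = 0$ it is $\{1\}$.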

The following proposition is proved on page 71 of F.H. Clarke (1983).

Proposition 2.1: The generalized Jacobian $\partial f(\theta)$ is upper semicontinuous, which means that, given $\varepsilon > 0$, there exists a $\delta > 0$ such that for $\tau \in U_\delta(\theta)$, the open ball of radius $\delta$ centred at $\theta$,

$$\partial f(\tau) \subset \partial f(\theta) + \varepsilon B_{r\times r}.$$

Here $B_{r\times r}$ is the unit ball of matrices, for which $B \in B_{r\times r}$ implies $\|B\| \le 1$.


Remark 2.1: Without loss of generality we can take $\|B\|$ to be the least upper bound of $|By|$ where $|y| < 1$.

Frequently several solutions of equations (1.1), (1.2) can exist, whereupon a robust selection of the functional root is obtained using the idea of a selection functional $\rho$ introduced in Clarke (1983). The robust selection functional retains the continuity properties of the selected root in small enough neighbourhoods $n(\varepsilon,F) \subset \mathcal{G}$ of a distribution $F$, which can be considered here to be defined by metrics $d_*$. The M-functional is then defined by $\psi$ and $\rho$ as $T[\psi,\rho,\cdot]$. Typical choices for $d_*$ include $d_K$, $d_L$ and $d_P$, the Kolmogorov, Lévy and Prokhorov metrics respectively.
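Of these metrics, the Kolmogorov distance is the simplest to compute exactly for an empirical distribution. Assuming $F$ is continuous, $\sup_x |F_n(x) - F(x)|$ is attained at an order statistic or as $x$ increases to one, which gives the short sketch below; the function name is illustrative. For the normal scores $X_i = \Phi^{-1}((i - 0.5)/n)$ it returns $1/(2n)$.

```python
from statistics import NormalDist


def kolmogorov_distance(sample, cdf):
    """d_K(F_n, F) = sup_x |F_n(x) - F(x)| for a continuous distribution function F.

    The supremum is attained at an order statistic or as x increases to one, so
    it suffices to compare F(x_(i)) with i/n and (i - 1)/n.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


if __name__ == "__main__":
    n = 100
    scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    print(kolmogorov_distance(scores, NormalDist().cdf))  # 1/(2n) = 0.005 up to rounding
```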


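Conditions A':

$A'_0$: $T[\psi,\rho,F_\theta] = \theta$;

$A'_1$: $\psi(x,\tau)$ is an $r \times 1$ vector function on $R \times \Theta$ which is continuous and bounded on $R \times D$, where $D \subset \Theta$ is some nondegenerate compact interval containing $\theta$ in its interior, and $R$ is some separable metrizable space;

$A'_2$: $\psi(x,\tau)$ is locally Lipschitz in $\tau$ about $\theta$ in the sense that, for some constant $\alpha$,

$$|\psi(x,\tau) - \psi(x,\theta)| \le \alpha\,|\tau - \theta|$$

uniformly in $x \in R$ and for all $\tau$ in a neighbourhood of $\theta$;

$A'_3$: letting differentiation be with respect to the argument in parentheses, $\partial K_{F_\theta}(\tau)$ is of maximal rank at $\tau = \theta$;

$A'_4$: given $\delta > 0$ there exists an $\varepsilon > 0$ such that for all $G \in n(\varepsilon,F_\theta)$

$$\sup_{\tau \in D} |K_G(\tau) - K_{F_\theta}(\tau)| < \delta$$

and

$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + \delta B_{r\times r}$$

uniformly in $\tau \in D$.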


Remark 2.2: $A'_1 = A_1$.

Remark 2.3: For a function satisfying $A'_1$ it follows from Remark 2.2 and Theorem 6.1 in Clarke (1983) that, given $\delta > 0$, there exists an $\varepsilon > 0$ such that for all $G \in n(\varepsilon,F_\theta)$

$$\sup_{\tau \in D} |K_G(\tau) - K_{F_\theta}(\tau)| < \delta$$

whenever $n(\varepsilon,F_\theta)$ is generated by the metrics $d_L$ or $d_P$. This establishes the first part of condition $A'_4$.
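For Huber's location function $\psi_k(x - \tau)$ at $F_\theta = \Phi$ ($\theta = 0$), both $K_\Phi$ and its derivative have closed forms, which makes conditions $A'_0$, $A'_3$ and the first part of $A'_4$ easy to inspect numerically. The sketch below assumes this $\psi$, uses deterministic normal scores $X_i = \Phi^{-1}((i - 0.5)/n)$ in place of a random sample, and approximates the supremum over $D = [-1, 1]$ on a grid; the names are hypothetical. For these scores integration by parts gives $\sup_\tau |K_{F_n}(\tau) - K_\Phi(\tau)| \le 2k\,d_K(F_n,\Phi) = k/n$.

```python
from statistics import NormalDist

N = NormalDist()


def huber_psi(x, k):
    return max(-k, min(k, x))


def K_normal(tau, k):
    """K_Phi(tau) = E psi_k(Z - tau), Z ~ N(0, 1), in closed form."""
    Phi, phi = N.cdf, N.pdf
    return (k * (1 - Phi(tau + k)) - k * Phi(tau - k)
            + phi(tau - k) - phi(tau + k)
            - tau * (Phi(tau + k) - Phi(tau - k)))


def K_normal_derivative(tau, k):
    """dK_Phi/dtau = -(Phi(tau + k) - Phi(tau - k)); nonzero, so A'_3 holds at theta = 0."""
    return -(N.cdf(tau + k) - N.cdf(tau - k))


def K_empirical(tau, sample, k):
    """K_{F_n}(tau) = (1/n) sum psi_k(x_i - tau)."""
    return sum(huber_psi(x - tau, k) for x in sample) / len(sample)


def normal_scores(n):
    """Deterministic stand-in for a sample: x_i = Phi^{-1}((i - 0.5)/n), so d_K(F_n, Phi) = 1/(2n)."""
    return [N.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def sup_distance(sample, k, D=(-1.0, 1.0), n_grid=201):
    """Grid approximation of sup_{tau in D} |K_{F_n}(tau) - K_Phi(tau)|."""
    lo, hi = D
    taus = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    return max(abs(K_empirical(t, sample, k) - K_normal(t, k)) for t in taus)


if __name__ == "__main__":
    k = 1.345
    print("K_Phi(0) =", K_normal(0.0, k))              # Fisher consistency
    print("K_Phi'(0) =", K_normal_derivative(0.0, k))  # nonzero derivative
    for n in (10, 100, 1000):
        print(n, sup_distance(normal_scores(n), k), "bound k/n =", k / n)
```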


Remark 2.4: If $K_{F_\theta}(\tau)$ is continuously differentiable in $\tau$ at $\theta$, then $A_3 = A'_3$, where condition $A_3$ is that of Clarke (1983).

Conditions $A'_0$–$A'_3$ can be considered fairly straightforward, whereas condition $A'_4$ is not so obvious. When $R = E$, the real line, it can be shown to be a consequence of the following theorem, a proof of which is detailed in the appendix. It is sufficient here to establish the result for the Kolmogorov distance $d_K$.


Theorem 2.1: Let $A$ be a class of continuous functions defined on $E$ with the following properties: (1) $A$ is uniformly bounded, that is, there exists a constant $H$ such that $|f(x)| \le H < \infty$ for all $f \in A$ and $x \in E$; and (2) $A$ is equicontinuous. Let $F_\theta \in \mathcal{G}$ be given. Then for every $\delta > 0$ there is an $\varepsilon > 0$ such that $d_K(F_\theta,G) < \varepsilon$ implies

$$\sup_{f\in A}\ \sup_{x \in E\cup\{+\infty\}} \Big|\int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta\Big| < \delta, \tag{2.1}$$

where integration is performed over the intervals $I_x$, which can be either open or closed, of the form $(-\infty,x)$ or $(-\infty,x]$.
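The content of Theorem 2.1 can be probed numerically for one concrete class. The sketch below assumes $A = \{\psi_k(\cdot - \tau) : \tau \in D\}$ with Huber's $\psi_k$ (bounded by $H = k$ and equicontinuous), $F_\theta = \Phi$, and $G$ the empirical distribution of normal scores. It evaluates $\sup_f \sup_x |\int_{I_x} f\,dG - \int_{I_x} f\,d\Phi|$ exactly over a finite grid of $\tau$, for both open and closed $I_x$. The bound $3k\,d_K(G,\Phi)$ checked alongside comes from integration by parts for this particular class, not from the proof in the appendix, and all names are hypothetical.

```python
from statistics import NormalDist

N = NormalDist()


def huber_psi(x, k):
    return max(-k, min(k, x))


def partial_normal(x, tau, k):
    """Integral over (-inf, x] of psi_k(y - tau) dPhi(y), in closed form (x may be +inf)."""
    Phi, phi = N.cdf, N.pdf
    lo, hi = tau - k, tau + k
    total = -k * Phi(min(x, lo))                 # part where psi = -k
    if x > lo:                                   # part where psi(y - tau) = y - tau
        u = min(x, hi)
        total += phi(lo) - phi(u) - tau * (Phi(u) - Phi(lo))
    if x > hi:                                   # part where psi = +k
        total += k * (Phi(x) - Phi(hi))
    return total


def sup_partial_difference(sample, k, taus):
    """sup over f = psi_k(. - tau), tau in taus, and over x in E u {+inf}, of
    |int_{I_x} f dF_n - int_{I_x} f dPhi| with I_x = (-inf, x) or (-inf, x].

    Between observations the F_n term is constant and the Phi term is monotone on
    each side of x = tau, so it suffices to check the observations (both interval
    types), the point x = tau, and x = +inf.
    """
    xs = sorted(sample)
    n = len(xs)
    best = 0.0
    for tau in taus:
        cum = 0.0
        for x in xs:
            p = partial_normal(x, tau, k)
            best = max(best, abs(cum - p))       # open interval: atom at x excluded
            cum += huber_psi(x - tau, k) / n
            best = max(best, abs(cum - p))       # closed interval: atom at x included
        emp_tau = sum(huber_psi(x - tau, k) for x in xs if x <= tau) / n
        best = max(best, abs(emp_tau - partial_normal(tau, tau, k)))
        best = max(best, abs(cum - partial_normal(float("inf"), tau, k)))
    return best


if __name__ == "__main__":
    k, n = 1.345, 200
    xs = [N.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]  # d_K(F_n, Phi) = 1/(2n)
    taus = [-1.0 + 0.05 * j for j in range(41)]
    print(sup_partial_difference(xs, k, taus), "<=", 3 * k / (2 * n))
```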


Remark 2.5: A similar proof yields the same result with $d_K$ replaced by $d_L$. In some instances Fréchet differentiability with respect to $d_K$ implies that with respect to $d_L$ and $d_P$, following (6.2) of Clarke (1983).

Consider $\psi$ with continuous partial derivatives except on a finite set of points $S(\tau)$. From F.H. Clarke (1983, pp. 75–83) it follows that

$$\partial K_G(\tau) = \partial \int \psi(y,\tau)\,dG(y) \subset \int \partial\psi(y,\tau)\,dG(y), \tag{2.2}$$

from which the right-hand side can be expanded to a finite summation

$$\sum_{j=1}^{m} \int_{I_j} f_j(y,\tau)\,dG(y) + \sum_{x \in S(\tau)} \partial\psi(x,\tau)\,G\{x\}.$$

Here $f_j \in A$ and $\frac{\partial\psi}{\partial\tau}(y,\tau) = f_j(y,\tau)$ on the connected interval $I_j$, for $j = 1,\dots,m$. Since $\psi$ is Lipschitz in $\tau$ and $\partial\psi(x,\tau)$ is bounded, Theorem 2.1 implies condition $A'_4$.
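The inclusion (2.2) can be strict. The sketch below assumes Huber's location function $\psi_k(x - \tau)$ and an empirical $G = F_n$, so that $K_{F_n}$ is piecewise linear in $\tau$ and its generalized gradient is the interval between its one-sided derivatives. It compares this with the right side of (2.2), where each observation sitting exactly on a corner contributes an interval $[-1, 0]/n$. The data are chosen to be exactly representable so that the equality tests on residuals are meaningful; the names are hypothetical.

```python
def huber_psi(x, k):
    return max(-k, min(k, x))


def K_n(tau, sample, k):
    """K_{F_n}(tau) = (1/n) sum psi_k(x_i - tau)."""
    return sum(huber_psi(x - tau, k) for x in sample) / len(sample)


def one_sided_derivatives(sample, tau, k):
    """Left and right derivatives of K_{F_n} at tau.

    A residual x - tau strictly inside (-k, k) contributes -1/n on both sides.
    A residual equal to +k enters the linear part as tau increases (right side only);
    a residual equal to -k enters it as tau decreases (left side only).
    """
    n = len(sample)
    interior = sum(1 for x in sample if abs(x - tau) < k)
    at_plus_k = sum(1 for x in sample if x - tau == k)
    at_minus_k = sum(1 for x in sample if x - tau == -k)
    left = -(interior + at_minus_k) / n
    right = -(interior + at_plus_k) / n
    return left, right


def generalized_gradient(sample, tau, k):
    """Exact generalized gradient of the piecewise-linear K_{F_n} at tau."""
    left, right = one_sided_derivatives(sample, tau, k)
    return min(left, right), max(left, right)


def sum_rule_bound(sample, tau, k):
    """Right side of (2.2): the smooth part plus [-1, 0]/n for each observation on a corner."""
    n = len(sample)
    interior = sum(1 for x in sample if abs(x - tau) < k)
    corners = sum(1 for x in sample if abs(x - tau) == k)
    return -(interior + corners) / n, -interior / n


if __name__ == "__main__":
    data = [-1.0, 0.5, 1.0, 3.0]
    print(generalized_gradient(data, 0.0, 1.0))  # (-0.5, -0.5)
    print(sum_rule_bound(data, 0.0, 1.0))        # (-0.75, -0.25)
```

With $k = 1$ and $\tau = 0$ the two corner observations act on opposite sides, so $\partial K_{F_n}(0) = \{-0.5\}$ while the right side of (2.2) is $[-0.75, -0.25]$.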

3. UNIQUENESS OF FUNCTIONAL SOLUTIONS TO EQUATIONS

For those psi functions which do not admit a unique root of the equations, at least a unique root of the equations in a local region of the parameter space about $\theta$ can be shown to exist for small enough neighbourhoods of $F_\theta$. If conditions A' hold with respect to Lévy or Prokhorov neighbourhoods, existence of a weakly continuous root is shown, for which the global argument of Clarke (1983) can be used to select it if more than one root exists. When the Kolmogorov distance is used, only consistency is directly established.

The following propositions are established on pp. 252–255 of F.H. Clarke (1983), and obviate the condition of continuous derivatives in the argument for the inverse function theorem.


Proposition 3.1: Suppose $f$ satisfies the properties described in Section 2 and

$$4\Lambda_f = \inf_{\partial f(\theta)} \|M(\theta,f)\|,$$

where the infimum is taken over all matrices $M(\theta,f) \in \partial f(\theta)$, and for some $\delta > 0$, $\tau \in U_\delta(\theta)$ implies

$$2\Lambda_f < \inf_{\partial f(\tau)} \|M(\tau,f)\|.$$

Then for arbitrary $\tau_1, \tau_2 \in \bar U_\delta(\theta)$, the closure of the ball $U_\delta(\theta)$,

$$|f(\tau_1) - f(\tau_2)| \ge 2\Lambda_f\,|\tau_1 - \tau_2|.$$

Proposition 3.2: Under the conditions of Proposition 3.1, $f(U_\delta(\theta))$ contains $U_{\Lambda_f\delta}(f(\theta))$.

Remark 3.1: For $v \in U_{\Lambda_f\delta}(f(\theta))$ we can define $f^{-1}(v)$ to be the unique $\tau \in U_\delta(\theta)$ such that $f(\tau) = v$, and Proposition 3.1 implies that $f^{-1}$ is Lipschitz with Lipschitz constant $1/(2\Lambda_f)$.

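Lemma 3.1: Let conditions A' hold for some $\psi, \rho$. Then there is a $\delta_1 > 0$ and an $\varepsilon_1 > 0$ such that for all $G \in n(\varepsilon_1,F_\theta)$ any matrix $M(\tau,G) \in \partial K_G(\tau)$, $\tau \in U_{\delta_1}(\theta)$, will satisfy $\|M(\tau,G)\| > 2\Lambda$, where $\Lambda$ is defined to be a value for which $M(\theta,F_\theta) \in \partial K_{F_\theta}(\theta)$ implies $\|M(\theta,F_\theta)\| \ge 4\Lambda$.

Remark 3.2: If $K_{F_\theta}(\tau)$ is continuously differentiable in $\tau$, then the choice $\Lambda = 1/(4\|M(\theta,F_\theta)^{-1}\|)$ satisfies the criterion of Lemma 3.1.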


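Proof of Lemma 3.1: Since $\partial K_{F_\theta}$ is upper semicontinuous, by Proposition 2.1 choose $\delta_1 > 0$ such that $\partial K_{F_\theta}(\tau) \subset \partial K_{F_\theta}(\theta) + \Lambda B_{r\times r}$ whenever $\tau \in U_{\delta_1}(\theta)$. By condition $A'_4$ there exists an $\varepsilon_1 > 0$ such that $G \in n(\varepsilon_1,F_\theta)$ implies

$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + \Lambda B_{r\times r} \quad \text{uniformly in } \tau \in D.$$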


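Proof of Theorem 2.1: Given $\delta > 0$ choose $c > 0$ so that $F_\theta\{E - C\} < \delta/(8H)$, where $C = (-c, c]$. For any $x \in (-\infty,-c]$ and $G$ within Kolmogorov distance $\delta/(8H)$ of $F_\theta$, since $G\{E - C\} \le F_\theta\{E - C\} + 2d_K(G,F_\theta)$,

$$\Big|\int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta\Big| \le H\big(G\{I_x\} + F_\theta\{I_x\}\big) \le H\big(G\{E-C\} + F_\theta\{E-C\}\big) < \delta/2.$$

Let $\varepsilon'$ be given by Lemma 6.2 for the choice $\eta = \delta/2$, and choose $\varepsilon = \min\{\varepsilon', \delta/(8H)\}$. Then for arbitrary $x > -c$ and $G$ within Kolmogorov distance $\varepsilon$ of $F_\theta$,

$$\Big|\int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta\Big| \le H\big(G\{E-C\} + F_\theta\{E-C\}\big) + \sup_{y\in C}\Big|\int_{C\cap I_y} f\,dG - \int_{C\cap I_y} f\,dF_\theta\Big| < \delta/2 + \delta/2$$

by Lemma 6.2. Hence $d_K(G,F_\theta) < \varepsilon$ implies

$$\sup_{f\in A}\ \sup_{x\in E}\Big|\int_{I_x} f\,dG - \int_{I_x} f\,dF_\theta\Big| < \delta.$$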
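In particular we can choose $\varepsilon^*$ such that $d_K(G,F_\theta) < \varepsilon^*$ implies

$$G(x_{ij}^-) - G(x_{i,j-1}) < \eta/(4H), \quad j = 1,\dots,n_i,\ i = 1,\dots,k.$$

Let $G^*$ be the corresponding improper measure constructed from $G$. Then

$$\sup_{f\in A}\ \sup_{x\in C}\Big|\int_{C\cap I_x} f\,dG^* - \int_{C\cap I_x} f\,dG\Big| < 3\eta/4.$$

It is now convenient to consider case (b) first. Writing $(+)$ for $\sup_{f\in A}\big|\int_{C\cap I_x} f\,dG^* - \int_{C\cap I_x} f\,dF^*\big|$,

$$\begin{aligned} (+) &\le H\sum_{j=1}^{i_x}\Big|\int_{(b_{j-1},b_j)} d(G-F_\theta)\Big| + H\sum_{j=1}^{i_x}\big|G\{b_j\} - F_\theta\{b_j\}\big| \\ &\le H\sum_{j=1}^{n'}\Big|\int_{(b_{j-1},b_j)} d(G-F_\theta)\Big| + H\sum_{j=1}^{n'}\big|G\{b_j\} - F_\theta\{b_j\}\big| =: (*). \end{aligned}$$

Choose $0 < \varepsilon' < \varepsilon^*$ such that $d_K(G,F_\theta) < \varepsilon'$ implies $(*) < \eta/4$.

There are two possibilities for case (a). Either $b_{i_x} < x < p_{i_x}$ for some $0 \le i_x \le n' - 1$, whence

$$(+) \le (*) + \sup_{f\in A}\Big|\int_{(b_{i_x},c]\cap I_x} f\,d(G^* - F^*)\Big| = (*) < \eta/4,$$

or $p_{i_x} \le x < b_{i_x+1}$, $0 \le i_x \le n' - 1$, for which

$$(+) \le (*) + H\big|G(b_{i_x+1}^-) - G(b_{i_x}) - F_\theta(b_{i_x+1}^-) + F_\theta(b_{i_x})\big|.$$

For either case, if it happens that $d_K(G,F_\theta) < \varepsilon'$, then

$$\sup_{f\in A}\ \sup_{x\in C}\Big|\int_{C\cap I_x} f\,dG^* - \int_{C\cap I_x} f\,dF^*\Big| < \eta/2.$$

Then the lemma follows.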


That is, no further partitioning is necessary. Let $\{b_i\}_{i=0}^{n'}$ be the set of points that partition $(-c,c]$ formed by combining $\{x_{ij}\}$, $i = 1,\dots,k$. Denote by $F^*$ the possibly improper distribution that attributes weight $F_\theta(b_i) - F_\theta(b_i^-)$ to the points $b_i$, and weight $F_\theta(b_{i+1}^-) - F_\theta(b_i)$ to the points $p_i = \tfrac12(b_i + b_{i+1})$, $i = 0,\dots,n'-1$.

Suppose $x \in C$ is given. Then either (a) there exists an $0 \le i_x < n'$ such that $b_{i_x} < x < b_{i_x+1}$, or (b) there exists a $1 \le i_x \le n'$ for which $x = b_{i_x}$.

For case (a) and $f \in A$,

$$\begin{aligned} \Big|\int_{C\cap I_x} f\,dF^* - \int_{C\cap I_x} f\,dF_\theta\Big| &\le \sum_{j=1}^{i_x}\int_{(b_{j-1},b_j)} |f(p_{j-1}) - f(y)|\,dF_\theta(y) + \Big|\int_{(b_{i_x},c]\cap I_x} f\,d(F^* - F_\theta)\Big| \\ &\le (\eta/4)\,F_\theta\{C\} + 2H\big(F_\theta(b_{i_x+1}^-) - F_\theta(b_{i_x})\big) \\ &< \eta/4 + \eta/2 = 3\eta/4. \end{aligned}$$

For case (b), where $x = b_{i_x}$ for some $1 \le i_x \le n'$,

$$\Big|\int_{C\cap I_x} f\,dF^* - \int_{C\cap I_x} f\,dF_\theta\Big| \le \sum_{j=1}^{i_x}\int_{(b_{j-1},b_j)} |f(p_{j-1}) - f(y)|\,dF_\theta(y) \le (\eta/4)\,F_\theta\{C\} \le \eta/4.$$

Hence

$$\sup_{f\in A}\ \sup_{x\in C}\Big|\int_{C\cap I_x} f\,dF^* - \int_{C\cap I_x} f\,dF_\theta\Big| < 3\eta/4.$$

This is true for any distribution satisfying the inequalities (6.2).
But this contradicts the initial assumption. Now since

$$G(y_j^-) \le G(a) + \frac{j}{k}\big(G(b^-) - G(a)\big), \quad j = 1,\dots,k,$$

then

$$G(y_j^-) - G(y_{j-1}) \le \frac{1}{k}\big(G(b^-) - G(a)\big) < \eta.$$

Note that $y_0 = a$, $y_1 > a$, and if $y_k < b$ then $G(b^-) - G(y_k) = 0$. Let $\{x_j\}_{j=0}^{k'}$, with $x_0 = a$ and $x_{k'} = b$, be the partition formed from $\{y_j\}_{j=1}^{k} \cup \{b\}$.
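The construction in the proof of Lemma 6.1 is directly computable for a distribution with finitely many atoms. The sketch below assumes such a $G$ (a hypothetical `DiscreteCDF` helper), forms $y_j = G^{-1}\{G(a) + (j/k)(G(b^-) - G(a))\}$ with the smallest integer $k$ exceeding $(G(b^-) - G(a))/\eta$, and returns the partition generated by the $y_j$ together with $a$ and $b$. It illustrates the proof's idea under these assumptions rather than reproducing it verbatim.

```python
import math


class DiscreteCDF:
    """Hypothetical helper: distribution function of finitely many atoms (location, mass)."""

    def __init__(self, atoms):
        self.atoms = sorted(atoms)

    def __call__(self, x):
        """G(x): right-continuous, includes an atom at x."""
        return sum(m for loc, m in self.atoms if loc <= x)

    def left(self, x):
        """G(x-) = lim_{h -> 0+} G(x - h): excludes an atom at x."""
        return sum(m for loc, m in self.atoms if loc < x)

    def inverse(self, t, a, b):
        """G^{-1}(t) = inf{x in [a, b] : G(x) >= t}; returns b if no point qualifies."""
        if self(a) >= t:
            return a
        for loc, _ in self.atoms:
            if a < loc <= b and self(loc) >= t:
                return loc
        return b


def lemma_partition(G, a, b, eta):
    """Partition a = x_0 < ... < x_k' = b following the proof of Lemma 6.1.

    Assumes the jumps of G in (a, b) are below eta / 4.
    """
    s = G.left(b) - G(a)
    if s < eta:
        return [a, b]                     # no further partitioning is necessary
    k = math.floor(s / eta) + 1           # smallest integer exceeding s / eta
    ys = [G.inverse(G(a) + j * s / k, a, b) for j in range(1, k + 1)]
    return sorted({a, b, *(y for y in ys if a < y < b)})


if __name__ == "__main__":
    G = DiscreteCDF([(i / 100, 0.01) for i in range(1, 100)])  # jumps 0.01 < eta/4
    pts = lemma_partition(G, 0.0, 1.0, 0.2)
    print(pts)
    print([G.left(q) - G(p) for p, q in zip(pts, pts[1:])])   # each increment < 0.2
```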


Lemma 6.2: Let $F_\theta$ be given. Also, for given $c > 0$ let $C = (-c, c]$. Then for every $\eta > 0$ there exists $\varepsilon' > 0$ such that $d_K(G,F_\theta) \le \varepsilon'$ implies

$$\sup_{f\in A}\ \sup_{x\in C}\Big|\int_{C\cap I_x} f(y)\,dG(y) - \int_{C\cap I_x} f(y)\,dF_\theta(y)\Big| < \eta, \tag{6.1}$$

where the intervals $I_x$ may represent either open or closed intervals from $-\infty$ to $x$.

Proof: Given $\eta > 0$, let $\{d_i\}_{i=1}^{l}$ be the at most finite set of points in $C$ such that $F_\theta(d_i) - F_\theta(d_i^-) \ge \eta/(16H)$, if they exist. Since the family $A$ is equicontinuous and $\bar C$, the closure of $C$, is compact, we may choose a decomposition

$$-c = a_0 < a_1 < \dots < a_m = c$$

so that $a_{i-1} \le x < y \le a_i$ implies $|f(x) - f(y)| < \eta/4$ for every $f \in A$ and $i = 1,\dots,m$. Let $\{a_i^*\}_{i=0}^{k}$ be the further decomposition obtained by combining the points $\{a_i\}_{i=0}^{m}$ and $\{d_i\}_{i=1}^{l}$, so that $a_{i-1}^* < a_i^*$, $i = 1,\dots,k$. From Lemma 6.1, whenever $F_\theta(a_i^{*-}) - F_\theta(a_{i-1}^*) \ge \eta/(4H)$ there exists a finite decomposition $\{x_{ij}\}_{j=0}^{n_i}$ so that

$$a_{i-1}^* = x_{i0} < x_{i1} < \dots < x_{in_i} = a_i^*$$

for which

$$F_\theta(x_{ij}^-) - F_\theta(x_{i,j-1}) < \eta/(4H), \quad j = 1,\dots,n_i.$$

If $F_\theta(a_i^{*-}) - F_\theta(a_{i-1}^*) < \eta/(4H)$, set $n_i = 1$, $x_{i0} = a_{i-1}^*$ and $x_{i1} = a_i^*$.
6. APPENDIX

The proof of Theorem 2.1 is preceded by some necessary lemmas. The notational abbreviation $G(x^-) = \lim_{h\downarrow 0} G(x - h)$ is used.

Lemma 6.1: Let $G(x)$ be any distribution function for which $G(x) - G(x^-) < \eta/4$ for $x \in (a,b)$, where $a < b$ real and $\eta > 0$ are given. If $G(b^-) - G(a) \ge \eta$, then there exists a finite partition

$$a = x_0 < x_1 < \dots < x_{k'} = b$$

so that

$$G(x_j^-) - G(x_{j-1}) < \eta, \quad j = 1,\dots,k'.$$

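Proof: Define $G^{-1}(t) = \inf\{x \mid G(x) \ge t,\ x \in [a,b]\}$. Since $G$ is right continuous, $G(G^{-1}(t)) \ge t$. Choose

$$y_j = G^{-1}\Big\{G(a) + \frac{j}{k}\big(G(b^-) - G(a)\big)\Big\}, \quad j = 0,1,\dots,k,$$

where $k > 1$ is chosen so that

$$\frac{G(b^-) - G(a)}{\eta} < k < \frac{2\big(G(b^-) - G(a)\big)}{\eta}.$$

Then

$$\begin{aligned} G(y_j) - G(y_{j-1}) &\ge G(a) + \tfrac{j}{k}\big(G(b^-) - G(a)\big) - G(y_{j-1}) \\ &\ge G(a) + \tfrac{j}{k}\big(G(b^-) - G(a)\big) - \Big\{G(a) + \tfrac{j-1}{k}\big(G(b^-) - G(a)\big) + \eta/4\Big\} \\ &= \tfrac{1}{k}\big(G(b^-) - G(a)\big) - \eta/4 \\ &\ge \eta/4. \end{aligned}$$

If $y_j \in (a,b)$, $j = 1,\dots,k$, then $y_j > y_{j-1}$. For if $y_j = y_{j-1}$, then

$$\begin{aligned} G(y_j) - G(y_j^-) &= G(y_j) - G(y_{j-1}^-) \\ &\ge G(y_j) - G(y_{j-1}) \\ &\ge \eta/4. \end{aligned}$$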

The problems induced by nonsmooth psi functions are not unique to proofs of Fréchet differentiability, and arise in many asymptotic proofs. More frequently, rather than confronting these difficulties, the appropriate smoothness assumptions are made in the proofs, and yet the results are expected to apply to continuous but nonsmooth functions also. Nonsmooth analysis can then be considered as one possible avenue for justifying such an approach.
With the choice of selection functional $\rho(G,\tau) = |\tau - G^{-1}(\tfrac12)|$, whereby the root closest to the median is selected, the functional $T[\psi_{a,b,c},\rho,\cdot]$ is Fréchet differentiable at $\Phi\big(\tfrac{x-\theta_1}{\theta_2}\big)$.
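A sketch of this selection rule, assuming illustrative constants $a = 1.5$, $b = 3.5$, $c = 8$ for $\psi_{a,b,c}(x - \tau)$ and an empirical $G = F_n$. Roots of $K_{F_n}$ are bracketed on a grid over $[\min X_i, \max X_i]$ (which excludes the trivial zeros of $K_{F_n}$ far from the data), refined by bisection, and the root nearest $G^{-1}(\tfrac12) = \inf\{x : F_n(x) \ge \tfrac12\}$ is returned. The grid size and helper names are hypothetical, and a grid can miss pairs of roots closer together than its spacing.

```python
import math


def hampel_psi(x, a=1.5, b=3.5, c=8.0):
    """Hampel's three-part redescending psi function (illustrative a, b, c)."""
    ax, s = abs(x), math.copysign(1.0, x)
    if ax <= a:
        return x
    if ax <= b:
        return a * s
    if ax <= c:
        return a * (c - ax) / (c - b) * s
    return 0.0


def K_n(tau, sample):
    """K_{F_n}(tau) = (1/n) sum psi_{a,b,c}(x_i - tau)."""
    return sum(hampel_psi(x - tau) for x in sample) / len(sample)


def bisect(f, lo, hi, tol=1e-12):
    """Bisection assuming f(lo) != 0 and f(hi) is zero or of the opposite sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if fmid != 0 and (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def all_roots(sample, n_grid=2000):
    """Roots of K_{F_n} in [min x_i, max x_i], bracketed on a grid and refined."""
    def f(t):
        return K_n(t, sample)

    lo, hi = min(sample), max(sample)
    h = (hi - lo) / n_grid
    roots = []
    for i in range(n_grid):
        g0, g1 = lo + i * h, lo + (i + 1) * h
        k0, k1 = f(g0), f(g1)
        if (k0 > 0 and k1 <= 0) or (k0 < 0 and k1 >= 0):
            roots.append(bisect(f, g0, g1))
    return roots


def lower_median(sample):
    """G^{-1}(1/2) = inf{x : F_n(x) >= 1/2}."""
    s = sorted(sample)
    return s[math.ceil(len(s) / 2) - 1]


def select_root(sample):
    """rho(G, tau) = |tau - G^{-1}(1/2)|: the root closest to the median."""
    m = lower_median(sample)
    return min(all_roots(sample), key=lambda t: abs(t - m))


if __name__ == "__main__":
    data = [0.0, 0.2, -0.1, 0.1, -0.2, 10.0, 10.1, 9.9]
    print(all_roots(data))    # three roots: near 0, 5.75 and 10
    print(select_root(data))  # the root near 0
```

With five observations near 0 and three near 10, $K_{F_n}$ has roots near 0, 5.75 and 10; the median-based rule picks the one near 0.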

In a sense, weak continuity and Fréchet differentiability of the functional at the empirical distribution function are also important. Weak continuity at $F_n$ indicates stability of the estimate in the presence of rounding errors in the recording of observations, and at least for sufficiently large $n$ the effects of gross errors can be considered blunted. Fréchet differentiability at $F_n$, on the other hand, could be used to justify asymptotics involved in Edgeworth-type expansions and bootstrapping, for example as considered in Hampel (1982) and Beran (1982). When the psi function is smooth, the only change to the arguments of Clarke (1983) for Fréchet differentiability at $F_n$ is to replace $F_\theta$ by $F_n$ in conditions $A_1$–$A_4$.

Similarly, the same substitution of conditions can be made in the results of this paper. However, if it should occur that an observation $X$ falls exactly at a point where $\psi(X,\tau)$ does not have a continuous partial derivative at $\tau = T[F_n]$, then the generalized gradient $\partial K_{F_n}(T[F_n])$ does not reduce to a single matrix. Even though such an event would occur with probability zero in most foreseeable examples in which the underlying distribution is absolutely continuous, the proof used in Theorem 4.1 nevertheless does not go through. In this instance the question of whether $T$ is Fréchet differentiable at $F_n$ is left open. At least in the domain of M-functionals defined through (1.2), it can be concluded that Huber's (1981) remarks should not be interpreted in the sense that Fréchet differentiability is too strong. This is only the case for nonrobust M-functionals, and consequently we should consider Fréchet differentiability an advantage.
$A'_4$ follows by Theorem 2.1 and because $\partial\psi(\tau_1 - k\tau_2,\tau)$ and $\partial\psi(\tau_1 + k\tau_2,\tau)$ are bounded. Assumption (4.1) holds for the Kolmogorov distance through integration by parts, noting that $\psi$ is a function of bounded total variation. Thus by Theorems 3.1 and 4.1 there exists a root that is Fréchet differentiable at

$$F_\theta = \Phi\Big(\frac{x-\theta_1}{\theta_2}\Big)$$

with respect to $d_K$. Since $\Phi$ has a bounded density, $T$ is Fréchet differentiable with respect to $d_L$ and $d_P$ also.
Consequently, the infinitesimal robustness of this M-estimator at the normal parametric distribution is evident through Fréchet differentiability. It is also Fréchet differentiable at the distribution $F_\theta = F_0\big(\tfrac{x-\theta_1}{\theta_2}\big)$, for which the density function of $F_0$ is

$$f_0(x) = \begin{cases} \dfrac{1-\varepsilon}{\sqrt{2\pi}}\, e^{-x^2/2}, & |x| \le k, \\[6pt] \dfrac{1-\varepsilon}{\sqrt{2\pi}}\, e^{k^2/2 - k|x|}, & |x| > k, \end{cases}$$

with $k$ and $\varepsilon$ connected through

$$\frac{2\phi(k)}{k} - 2\Phi(-k) = \frac{\varepsilon}{1-\varepsilon}$$

($\phi = \Phi'$ being the standard normal density). Then the M-estimator coincides with the mle, and provides another example of a robust and asymptotically efficient estimator.
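The relation between $k$ and $\varepsilon$ can be solved numerically, since its left side decreases from $+\infty$ towards 0 as $k$ increases. The sketch below (function names hypothetical) solves it by bisection and checks that $f_0$ then integrates to one, using the closed form $\int f_0 = (1-\varepsilon)\big(2\Phi(k) - 1 + 2\phi(k)/k\big)$, which follows from $\int_k^\infty e^{k^2/2 - kx}\,dx/\sqrt{2\pi} = \phi(k)/k$.

```python
import math
from statistics import NormalDist

N = NormalDist()


def lhs(k):
    """2 phi(k)/k - 2 Phi(-k): decreasing in k from +inf (k -> 0) towards 0."""
    return 2 * N.pdf(k) / k - 2 * N.cdf(-k)


def k_from_eps(eps, lo=1e-6, hi=20.0, tol=1e-13):
    """Solve 2 phi(k)/k - 2 Phi(-k) = eps / (1 - eps) for k by bisection."""
    target = eps / (1 - eps)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid  # left side too large, so k must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f0(x, k, eps):
    """Least favourable density: normal centre, exponential tails beyond |x| = k."""
    if abs(x) <= k:
        return (1 - eps) * N.pdf(x)
    return (1 - eps) * math.exp(k * k / 2 - k * abs(x)) / math.sqrt(2 * math.pi)


def total_mass(k, eps):
    """Closed-form integral of f0: (1 - eps)(2 Phi(k) - 1 + 2 phi(k)/k)."""
    return (1 - eps) * (2 * N.cdf(k) - 1 + 2 * N.pdf(k) / k)


if __name__ == "__main__":
    k = k_from_eps(0.05)
    print(k, total_mass(k, 0.05))  # k close to 1.40, mass 1 up to rounding
```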

Examples where multiple roots of the equations exist include Hampel's three-part redescending M-estimator for location, dependent on three parameters $a, b, c$:

$$\psi_{a,b,c}(x) = \begin{cases} x, & |x| \le a, \\ a\,\mathrm{sign}(x), & a < |x| \le b, \\ a\,\dfrac{c-|x|}{c-b}\,\mathrm{sign}(x), & b < |x| \le c, \\ 0, & c < |x|. \end{cases}$$


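$$M(\theta) = -\frac{1}{\theta_2}\begin{pmatrix} \int \psi_1'(y)\,d\Phi(y) & 0 \\ 0 & \int y\,\psi_2'(y)\,d\Phi(y) \end{pmatrix} \tag{5.1}$$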


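Condition $A'_0$ follows since $E_\Phi[\psi] = 0$. Conditions $A'_1$ and $A'_2$ hold by inspection, and $A'_3$ holds since $M(\theta)$ is nonsingular. Remark 2.2 suffices for the first part of $A'_4$. To apply Theorem 2.1 consider the function

$$f(x,\tau) = I_{(\tau_1-k\tau_2,\,\tau_1+k\tau_2)}(x)\,g(x,\tau) + I_{(-\infty,\,\tau_1-k\tau_2]}(x)\,g(\tau_1-k\tau_2,\tau) + I_{[\tau_1+k\tau_2,\,\infty)}(x)\,g(\tau_1+k\tau_2,\tau),$$

where

$$g(x,\tau) = -\frac{1}{\tau_2}\begin{pmatrix} 1 & \dfrac{x-\tau_1}{\tau_2} \\[4pt] \dfrac{2(x-\tau_1)}{\tau_2} & \dfrac{2(x-\tau_1)^2}{\tau_2^2} \end{pmatrix}$$

is the matrix of partial derivatives of $\psi\big(\tfrac{x-\tau_1}{\tau_2}\big)$ with respect to $\tau$ for $|x - \tau_1| < k\tau_2$. It is clear that $A = \{f(\cdot,\tau) : \tau \in D\}$ forms a bounded equicontinuous class of functions on $E$. Also

$$\partial K_G(\tau) = \int_{(\tau_1-k\tau_2,\,\tau_1+k\tau_2)} f(x,\tau)\,dG(x) + \partial\psi(\tau_1-k\tau_2,\tau)\,G\{\tau_1-k\tau_2\} + \partial\psi(\tau_1+k\tau_2,\tau)\,G\{\tau_1+k\tau_2\},$$

where differentiation of $\psi$ is with respect to the second argument, while

$$\partial K_{F_\theta}(\tau) = \int_{(\tau_1-k\tau_2,\,\tau_1+k\tau_2)} f(x,\tau)\,dF_\theta(x), \quad \text{where } F_\theta(x) = \Phi\Big(\frac{x-\theta_1}{\theta_2}\Big).$$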
for sufficiently large $k$. Consider the two-term expansion

$$0 = K_{G_k}(T[G_k]) = K_{G_k}(\theta) + M(\tau_k,G_k)\,(T[G_k] - \theta), \tag{4.3}$$

where $|\tau_k - \theta| \le |T[G_k] - \theta|$, which tends to zero as $k \to \infty$ by Theorem 3.1, and $\tau_k$ is evaluated at different points for each component function expansion obtained as a consequence of Proposition 4.1 (i.e. $\tau_k$ takes different values in each row of the matrix $M$). See from (4.3), (4.1) and Lemma 3.1 that

$$|T[G_k] - \theta| = O(K_{G_k}(\theta)) = O(\varepsilon_k).$$

Also,

$$T[G_k] - \theta = -M(\theta)^{-1}K_{G_k}(\theta) - M(\theta)^{-1}\{M(\tau_k,G_k) - M(\theta)\}(T[G_k] - \theta).$$

By upper semicontinuity of $\partial K_{F_\theta}(\tau)$ in $\tau$ and (4.2),

$$\|M(\tau_k,G_k) - M(\theta)\| = o(1).$$

So

$$|T[G_k] - \theta - T'_{F_\theta}(G_k - F_\theta)| = o(1)\,O(d_*(G_k,F_\theta)) = o(\varepsilon_k).$$

5. EXAMPLES AND CONCLUSION

Huber (1964, 1981) introduced a proposal for estimation of location and scale of the normal distribution, defined as a solution of

$$\int \psi\Big(\frac{x-\tau_1}{\tau_2}\Big)\,dF_n(x) = 0,$$

where $\Theta = \{(\tau_1,\tau_2) : -\infty < \tau_1 < \infty,\ \tau_2 > 0\}$ and the vector function $\psi = (\psi_1,\psi_2)'$, where

$$\psi_1(x) = \max[-k, \min(k,x)], \qquad \psi_2(x) = \psi_1(x)^2 - \beta(k),$$

and $\beta(k) = \int \min(k^2,x^2)\,d\Phi(x)$. Here $\Phi$ denotes the standard normal distribution.
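The constant $\beta(k)$ has the closed form $(2\Phi(k) - 1) - 2k\phi(k) + 2k^2(1 - \Phi(k))$, obtained by integrating $x^2\phi(x)$ by parts over $[-k, k]$. The sketch below evaluates it and checks, by a midpoint rule, that $\beta(k)$ matches its defining integral and that both components of $\psi$ have mean zero under $\Phi$, which is the Fisher consistency behind condition $A'_0$ at $\theta = (0, 1)$. The quadrature settings and names are illustrative assumptions.

```python
from statistics import NormalDist

N = NormalDist()


def beta(k):
    """beta(k) = int min(k^2, x^2) dPhi(x) = (2 Phi(k) - 1) - 2 k phi(k) + 2 k^2 (1 - Phi(k))."""
    return (2 * N.cdf(k) - 1) - 2 * k * N.pdf(k) + 2 * k * k * (1 - N.cdf(k))


def psi1(x, k):
    return max(-k, min(k, x))


def psi2(x, k, b):
    """psi_2(x) = psi_1(x)^2 - beta(k); pass b = beta(k) to avoid recomputing it."""
    return psi1(x, k) ** 2 - b


def normal_expectation(g, lo=-12.0, hi=12.0, m=40000):
    """Midpoint-rule approximation of E g(Z) for Z ~ N(0, 1)."""
    h = (hi - lo) / m
    return h * sum(g(lo + (i + 0.5) * h) * N.pdf(lo + (i + 0.5) * h) for i in range(m))


if __name__ == "__main__":
    k = 1.5
    b = beta(k)
    print("beta(k) closed form:", b)
    print("beta(k) quadrature: ", normal_expectation(lambda x: min(k * k, x * x)))
    print("E psi_1(Z) =", normal_expectation(lambda x: psi1(x, k)))
    print("E psi_2(Z) =", normal_expectation(lambda x: psi2(x, k, b)))
```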

Setting $\theta = (\theta_1,\theta_2)$, where $\theta$ now distinguishes the vector parameter, it follows that, since $K_{F_\theta}(\tau)$ is continuously differentiable, the Jacobian
Theorem 4.1: Let $\rho(G,\tau) = |\tau - \theta|$ and assume conditions A' hold with respect to this functional and neighbourhoods generated by the metrics $d_*$ on $\mathcal{G}$. Suppose for all $G \in \mathcal{G}$

$$\int_R \psi(x,\theta)\,d(G - F_\theta)(x) = O(d_*(G,F_\theta)). \tag{4.1}$$

Then $T[\psi,\rho,\cdot]$ is Fréchet differentiable at $F_\theta$ with respect to $(\mathcal{G}, d_*)$ and has derivative

$$T'_{F_\theta}(G - F_\theta) = -M(\theta)^{-1}\int_R \psi(x,\theta)\,d(G - F_\theta)(x).$$
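The expansion in Theorem 4.1 can be checked numerically in the simplest case, assuming Huber's location function $\psi_k(x - \tau)$ with $k = 1.345$, $\theta = 0$, $F_\theta = \Phi$ and the contaminated distribution $G = (1-\varepsilon)\Phi + \varepsilon\delta_x$, for which $d_K(G,\Phi) \le \varepsilon$. Here $M(\theta) = -(2\Phi(k) - 1)$ and $\int \psi(x,\theta)\,d(G - F_\theta) = \varepsilon\,\psi_k(x)$, so the derivative is $\varepsilon\,\psi_k(x)/(2\Phi(k) - 1)$, and the remainder divided by $\varepsilon$ should tend to zero. The names and the bracketing interval are illustrative.

```python
from statistics import NormalDist

N = NormalDist()
K_TUNE = 1.345  # illustrative Huber tuning constant


def huber_psi(x, k=K_TUNE):
    return max(-k, min(k, x))


def K_normal(tau, k=K_TUNE):
    """K_Phi(tau) = E psi_k(Z - tau), Z ~ N(0, 1), in closed form."""
    Phi, phi = N.cdf, N.pdf
    return (k * (1 - Phi(tau + k)) - k * Phi(tau - k)
            + phi(tau - k) - phi(tau + k)
            - tau * (Phi(tau + k) - Phi(tau - k)))


def T_contaminated(eps, x, k=K_TUNE, tol=1e-14):
    """Root of K_G(tau) = (1 - eps) K_Phi(tau) + eps psi_k(x - tau), by bisection.

    K_G is nonincreasing in tau; [-10, 10] brackets the root for the
    moderate eps and x used here.
    """
    def K(t):
        return (1 - eps) * K_normal(t, k) + eps * huber_psi(x - t, k)

    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if K(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linear_term(eps, x, k=K_TUNE):
    """T'_F(G - F) = -M(theta)^{-1} int psi d(G - F), with M(theta) = -(2 Phi(k) - 1)."""
    M = -(2 * N.cdf(k) - 1)
    return -(eps * huber_psi(x, k)) / M


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        rem = T_contaminated(eps, 3.0) - linear_term(eps, 3.0)
        print(f"eps={eps:g}: remainder/eps = {rem / eps:.2e}")
```

For $x = 3$ the scaled remainder shrinks roughly in proportion to $\varepsilon$, consistent with the $o(d_K(G,\Phi))$ error in (1.3).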


To prove the theorem it is necessary to introduce the following generalization of the mean value result, described as Proposition 2.6.5 in F.H. Clarke (1983).

Proposition 4.1: Let $f$ be Lipschitz on an open convex set $U$ in $E^r$ and let $\tau_1$ and $\tau_2$ be points in $U$. Then one has

$$f(\tau_2) - f(\tau_1) \in \mathrm{co}\,\partial f([\tau_1,\tau_2])(\tau_2 - \tau_1).$$

(The right-hand side above denotes the convex hull of all points of the form $Z(\tau_2 - \tau_1)$ where $Z \in \partial f(u)$ for some point $u$ in $[\tau_1,\tau_2]$. Since $[\mathrm{co}\,\partial f([\tau_1,\tau_2])](\tau_2 - \tau_1) = \mathrm{co}[\partial f([\tau_1,\tau_2])(\tau_2 - \tau_1)]$, there is no ambiguity.)

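Proof of Theorem 4.1: Abbreviate $T[\psi,\rho,\cdot] = T[\cdot]$ and let $\kappa, \varepsilon$ be given by Theorem 3.1. Let $\{\varepsilon_k\}$ be such that $\varepsilon_k \downarrow 0$ as $k \to \infty$, and let $\{G_k\}$ be any sequence such that $G_k \in n(\varepsilon_k,F_\theta)$. By Theorem 3.1, $T[G_k]$ exists and is unique in $U_{\kappa^*}(\theta)$ for $k \ge k_0$, where $\varepsilon_{k_0} \le \varepsilon$. By $A'_4$, for arbitrary $\delta > 0$,

$$\partial K_{G_k}(\tau) \subset \partial K_{F_\theta}(\tau) + \delta B_{r\times r} \quad \text{uniformly in } \tau \in D \tag{4.2}$$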
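Proof of Theorem 3.1: Since $\partial K_G(\tau)$ is upper semicontinuous in $\tau$, choose $0 < \kappa^* < \min(\delta_1, \kappa)$ such that $\tau \in U_{\kappa^*}(\theta)$ implies

$$\inf_{\partial K_G(\tau)} \|M(\tau,G)\| > 2\Lambda \quad \text{for all } G \in n(\varepsilon_1,F_\theta),$$

where the infimum is taken over all matrices $M(\tau,G) \in \partial K_G(\tau)$. Here $\delta_1$, $\varepsilon_1$ and $\Lambda$ are defined in Lemma 3.1. Hence

$$4\Lambda(G) = \inf_{\partial K_G(\theta)} \|M(\theta,G)\| > 2\Lambda.$$

Choose $0 < \varepsilon^* < \varepsilon_1$ so that the following relations hold:

$$\begin{aligned} \partial K_G(\tau) &\subset \partial K_{F_\theta}(\tau) + (\Lambda/4)B_{r\times r} && \text{by } A'_4 \\ &\subset \partial K_{F_\theta}(\theta) + (\Lambda/2)B_{r\times r} && \text{by Proposition 2.1} \\ &\subset \partial K_G(\theta) + \Lambda B_{r\times r}. \end{aligned}$$

Then for every $M(\tau,G) \in \partial K_G(\tau)$ there exists an $M(\theta,G) \in \partial K_G(\theta)$ such that

$$\|M(\tau,G) - M(\theta,G)\| < \Lambda < 2\Lambda(G),$$

whenever $G \in n(\varepsilon^*,F_\theta)$ and uniformly in $\tau \in U_{\kappa^*}(\theta)$.

By Proposition 3.1, $K_G(\cdot)$ is a one-to-one function from $U_{\kappa^*}(\theta)$ onto $K_G(U_{\kappa^*}(\theta))$, and by Proposition 3.2 the image set contains the open ball of radius $\Lambda\kappa^*/2$ about $K_G(\theta)$. The argument for uniqueness now proceeds as in Clarke (1983).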

4. FRÉCHET DIFFERENTIABILITY

It will be assumed in this section that $K_{F_\theta}(\tau)$ has at least a continuous derivative $K'_{F_\theta}(\tau)$ at $\tau = \theta$, which is denoted $M(\theta)$. This is common with absolutely continuous parametric families. With this restriction Fréchet differentiability follows.
Hence, given $M(\tau,G) \in \partial K_G(\tau)$ for $\tau \in U_{\delta_1}(\theta)$, there exists $M(\theta,F_\theta) \in \partial K_{F_\theta}(\theta)$ such that

$$\|M(\tau,G) - M(\theta,F_\theta)\| < 2\Lambda,$$

whence, since $\|M(\theta,F_\theta)\| \ge 4\Lambda$, it follows that $\|M(\tau,G)\| > 2\Lambda$.

It is now possible to state and prove the uniqueness argument of Theorem 3.1 of Clarke (1983) using the weakened conditions A'. The result also implies existence of a weakly continuous root for either Lévy or Prokhorov neighbourhoods. As usual, the following selection functional is only used as an auxiliary device.


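Theorem 3.1: Let $\rho(G,\tau) = |\tau - \theta|$ and suppose conditions A' hold. Then given $\kappa > 0$ there exists an $\varepsilon > 0$ such that $G \in n(\varepsilon,F_\theta)$ implies that $T[\psi,\rho,G]$ exists and is an element of $U_\kappa(\theta)$. Further, for this $\varepsilon$ there is a $\kappa^* > 0$ such that

$$I(\psi,G) \cap U_{\kappa^*}(\theta) = \{T[\psi,\rho,G]\},$$

and $\partial K_G(\tau)$ is of maximal rank for $\tau \in U_{\kappa^*}(\theta)$. For any null sequence of positive numbers $\{\varepsilon_k\}$, let $\{G_k\}$ be an arbitrary sequence for which $G_k \in n(\varepsilon_k,F_\theta)$. Then

$$\lim_{k\to\infty} T[\psi,\rho,G_k] = T[\psi,\rho,F_\theta] = \theta.$$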

